42 research outputs found

Speech, Speaker and Speaker's Gender Identification in Automatically Processed Broadcast Stream

Author: Nouza J.
Silovsky J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2006

This paper presents a set of techniques for classifying audio segments in a system for automatic transcription of broadcast programs. The task consists of deciding a) whether a segment should be labeled as speech or non-speech and, in the former case, b) whether the talking person is one of the speakers in the database and, if not, c) which gender the speaker belongs to. The result of the classification is used to extend the information provided by the transcription system and also to enhance the performance of the speech recognition module. Like most state-of-the-art speaker recognition systems, the proposed one is based on Gaussian Mixture Models (GMM). As the number of database speakers can be large, we introduce a technique that significantly speeds up the identification process. Furthermore, we compare several approaches to the estimation of GMM parameters. Finally, we present the results achieved in the classification of 230 minutes of real broadcast data.
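As a rough illustration of GMM-based segment classification in general (not the authors' implementation, and without the speed-up technique the abstract mentions), the sketch below scores a segment against diagonal-covariance GMMs and picks the best-scoring class. The function names, the model layout as `(weight, mean, variance)` tuples, and the toy one-dimensional models are all assumptions made for this example.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of a segment under one GMM.

    gmm is a list of (weight, mean, var) tuples (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
        top = max(comps)
        # log-sum-exp over mixture components, for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)

def classify(frames, models):
    """Return the label of the model with the highest average log-likelihood."""
    return max(models, key=lambda label: gmm_avg_loglik(frames, models[label]))

# Toy single-component, one-dimensional models (illustrative values only).
models = {
    "speech": [(1.0, [0.0], [1.0])],
    "non-speech": [(1.0, [5.0], [1.0])],
}
label = classify([[0.1], [-0.2], [0.3]], models)
```

The same scoring routine could be reused for each decision stage (speech vs. non-speech, known speaker, gender) by swapping the set of models, assuming one GMM is trained per class.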

Directory of Open Access Journals

DSpace@TUL

Digital library of Brno University of Technology

MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech

Author: Cerva P.
Nouza J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2004

The paper deals with the problem of efficient adaptation of speech recognition systems to individual users. The goal is to achieve better performance in specific applications where one known speaker is expected. In our approach, we adopt the MAP (Maximum A Posteriori) method for this purpose. The MAP-based formulae for the adaptation of the HMM (Hidden Markov Model) parameters are described. Several alternative versions of this method have been implemented and experimentally verified in two areas: first in the isolated-word recognition (IWR) task and later also in the large vocabulary continuous speech recognition (LVCSR) system, both developed for the Czech language. The results show that the word error rate (WER) can be reduced by more than 20% for a speaker who provides tens of words (in the case of IWR) or tens of sentences (in the case of LVCSR) for the adaptation. Recently, we have used the described methods in the design of two practical applications: voice dictation to a PC and automatic transcription of radio and TV news.
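The abstract does not reproduce the adaptation formulae, so the sketch below shows only the widely used textbook MAP update for a single Gaussian mean, which interpolates between the speaker-independent prior mean and the adaptation data. The function name, the argument layout, and the prior weight `tau` are assumptions for illustration; the paper's own variants may differ.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean vector.

    prior_mean : speaker-independent mean (list of floats)
    frames     : adaptation feature vectors (list of lists)
    posteriors : occupation probability of this Gaussian for each frame
    tau        : prior weight (assumed value); larger means trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    weighted_sum = [0.0] * dim
    for x, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]
    # new_mean = (tau * prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]

adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the result stays close to the prior mean, and as the occupancy grows it moves toward the mean of the speaker's own data, which fits the abstract's observation that tens of words or sentences already help.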

Directory of Open Access Journals

DSpace@TUL

Digital library of Brno University of Technology

Fast Keyword Spotting in Telephone Speech

Author: Nouza J.
Silovsky J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2009

In this paper, we present a system designed for detecting keywords in telephone speech. We focus not only on achieving high accuracy but also on very short processing time. The keyword spotting system can run in three modes: a) an off-line mode requiring less than 0.1xRT, b) an on-line mode with minimal (2 s) latency, and c) a repeated spotting mode, in which pre-computed values allow for additional acceleration. Its performance is evaluated on recordings of Czech spontaneous telephone speech using rather large and complex keyword lists.
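For readers unfamiliar with the "xRT" notation: the real-time (RT) factor is processing time divided by audio duration, so 0.1xRT means a recording is processed in a tenth of its length. A minimal sketch of the arithmetic follows; the function name and the sample numbers are hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: time spent processing divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds

# Hypothetical example: a 60 s recording processed in 6 s runs at 0.1xRT.
rtf = real_time_factor(6.0, 60.0)
```

A value below 1.0 means the system runs faster than real time.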

Directory of Open Access Journals

DSpace@TUL

Digital library of Brno University of Technology

Automatic Classifiers for Medical Data from Doppler Unit

Author: Klimovic T.
Malek J.
Nouza J.
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/01/2007

Nowadays, hand-held ultrasonic Doppler units are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow over time and blood pressures are measured at several positions on each lower limb. This project presents software that is able to analyze such data and classify it in real time into selected diagnostic classes. It is also capable of flagging some errors encountered during measurement. At the Department of Functional Diagnostics in the Regional Hospital of Liberec, a database of several hundred signals was collected. In cooperation with a specialist, the signals were manually classified into four classes. Subsequently, selected signal features were extracted and used for training a distance classifier and a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifiers. This paper compares the results of the software with those provided by a human expert. They agreed in 89% of cases.

Directory of Open Access Journals

DSpace@TUL

Digital library of Brno University of Technology

Very Fast Keyword Spotting System with Real Time Factor below 0.01

Author: J Foote
J Málek
J Nouza
X Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 21/07/2020

In this paper, we present an architecture of a keyword spotting (KWS) system that is based on modern neural networks, yields good performance on various types of speech data and can run very fast. We focus mainly on the last aspect and propose optimizations for all the steps required in a KWS design: signal processing and likelihood computation, Viterbi decoding, spot candidate detection and confidence calculation. We present time- and memory-efficient modelling by bidirectional feedforward sequential memory networks (an alternative to recurrent nets) either by standard triphones or so-called quasi-monophones, and an entirely forward decoding of speech frames (with minimal need for look back). Several variants of the proposed scheme are evaluated on 3 large Czech datasets (broadcast, internet and telephone, 17 hours in total) and their performance is compared by Detection Error Tradeoff (DET) diagrams and real-time (RT) factors. We demonstrate that the complete system can run in a single pass with an RT factor close to 0.001 if all optimizations (including a GPU for likelihood computation) are applied.

arXiv.org e-Print Archive

Crossref

A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users

Author: D Imseng
D-L Choi
DP Córdova Lucero
Ed Joode
F Rudzicz
GE Lancioni
I Kraljevski
J Borg
J Nouza
J Sigafoos
J Zhang
J-P Hosom
Jan Nouza
KF McCoy
L Besacier
M Bohac
MA Neerincx
Marek Bohac
Michaela Kucharova
MJF Gales
MS Hawley
O Chia Ai
O Saz
P Lal
P Xu
P Červa
Petr Červa
RA Wagner
SA Borrie
T Schultz
TH Falk
WK Seong
WR Rodríguez
Zoraida Callejas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Building a voice-operated system for learning disabled users is a difficult task that requires a considerable amount of time and effort. Due to the wide spectrum of disabilities and their different related phonopathies, most available approaches are targeted at a specific pathology. This may improve their accuracy for some users, but makes them unsuitable for others. In this paper, we present a cross-lingual approach to adapt a general-purpose modular speech recognizer for learning disabled people. The main advantage of this approach is that it allows rapid and cost-effective development by taking the already built speech recognition engine and its modules, and utilizing existing resources for standard speech in different languages for the recognition of the users' atypical voices. Although the recognizers built with the proposed technique obtain lower accuracy rates than those trained for specific pathologies, they can be used by a wide population and developed more rapidly, which makes it possible to design various types of speech-based applications accessible to learning disabled users. This research was supported by the project 'Favoreciendo la vida autónoma de discapacitados intelectuales con problemas de comunicación oral mediante interfaces personalizados de reconocimiento automático del habla', financed by the Centre of Initiatives for Development Cooperation (Centro de Iniciativas de Cooperación al Desarrollo, CICODE), University of Granada, Spain. This research was supported by the Student Grant Scheme 2014 (SGS) at the Technical University of Liberec.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Repositorio Institucional Universidad de Granada

DSpace@TUL

Combining Manual and Automatic Annotation of a Learner Corpus

Author: A. Díaz-Negrillo
D. Spoustová
J. Hajič
J. Nouza
S. Granger
T. Jelínek
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

Fully Automated Approach to Broadcast News Transcription in Czech Language

Author: D. Nejedlová
J. Nouza
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Crossref

Automatic Syllabification and Syllable Timing of Automatically Recognized Speech - for Czech

Author: C Yarra
GE Dahl
H Huici
J Nouza
RA Wagner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Crossref

DSpace@TUL

SPEECH AND COMPUTER Principles of speech communication, tasks, methods and applications

Author: Nouza J.
Publication date: 01/01/2009

The aim of the proceedings is to provide a detailed insight into computer speech processing. The publication was produced within the program "Support of the targeted research" at the AS CR, as part of the project "Assistance, information and communication services based on advanced voice technology".

National Repository of Grey Literature